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Transformed Schatten-1 Iterative Thresholding 
Algorithms for Low Rank Matrix Completion 

Shuai Zhang, Penghang Yin, and Jack Xin 


Abstract 

We study a non-convex low-rank promoting penalty function, the transformed Schatten-1 (TSl), and its 
applications in matrix completion. The TSl penalty, as a matrix quasi-norm defined on its singular values, 
interpolates the rank and the nuclear norm through a nonnegative parameter aG(0,-|-oo). We consider the 
unconstrained TSl regularized low-rank matrix recovery problem and develop a fixed point representation for 
its global minimizer. The TSl thresholding functions are in closed analytical form for all parameter values. The 
TSl threshold values differ in subcritical (supercritical) parameter regime where the TSl threshold functions are 
continuous (discontinuous). We propose TSl iterative thresholding algorithms and compare them with some state-of- 
the-art algorithms on matrix completion test problems. For problems with known rank, a fully adaptive TSl iterative 
thresholding algorithm consistently performs the best under different conditions, where ground truth matrices are 
generated by multivariate Gaussian, (0,1) uniform and Chi-square distributions. For problems with unknown rank, 

TSl algorithms with an additional rank estimation procedure approach the level of IRucL-g which is an iterative 
reweighted algorithm, non-convex in nature and best in performance. 

Keywords: Transformed Schatten-1 penalty, fixed point representation, closed form thresholding 

function, iterative thresholding algorithms, matrix completion. 

AMS Subject Classifications: 90C26, 90C46 


I. Introduction 

Low rank matrix completion problems arise in many applications such as collaborative filtering in 
recommender systems [|4l|, ifTTl . minimum order system and low-dimensional Euclidean embedding in 
control theory [fT4l . ifTSll . network localization ifTSl . and others [l26l . The mathematical problem is: 


min rank(X) s.t. X Eh, 


( 1 . 1 ) 


min rank(A) s.t. Xij = Mij, (i,j)Ehl 

XfZ^mxn 


where L is a convex set. In this paper, we are interested in methods for solving the affine rank minimization 
problem (ARMP) 

min rank(A) s.t. si^{X) = binW, 

where linear transformation : 3?™^” —)■ 3?^ and vector hEW are given. The matrix completion problem 

(1.3) 

is a special case of (11.21) . where X and M are both mxn matrices and is a subset of index pairs 

The optimization problems above are known to be NP-hard. Many alternative penalties have been 
utilized as proxies for finding low rank solutions in both the constrained and unconstrained settings: 

min F(X) s.t. Si^{X)=b (] A) 

and ^ 

(1.5) 
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The penalty funetion F(-) is defined on singular values of matrix X, typieally F{X) = where 

i 

ai is the z-th largest singular value of X arranged in deseending order. The Sehatten p-norm (nuelear norm 
at p=l) results when f{x)=x^, pE [0,1]. At p = 0 (p = 2), F is the rank (Frobenius norm). Reeovering 
rank under suitable eonditions for p G (0,1] has been extensively studied in theories and algorithms ll2l, llTl, 
[31, liT9l . [l20l . [l2n . [|23l, dill, [12^ . Non-eonvex penalty based methods have shown better performanee 
on hard problems [|20l, [|2^ . There is also a novel method to solve the constrained problem (11.41) . from 
the perspective of gauge dual [|33, [|33l. 

Recently, a class of ii based non-eonvex penalty, the transformed ii (TLl), has been found effective and 
robust for compressed sensing problems [[30l . [1311 . TLl interpolates io and ii, similar to tp quasi-norm 
(pG (0,1)). In the entire range of interpolation parameter, TLl enjoys closed form iterative thresholding 
function, which is available for ip only at some specific values, like p = 0,1,1/2,2/3, see [HI, 0, iTTl , 
[[291 . This feature allows TLl to perform fast and robust sparse minimization in a much wider range than 
Ip quasi-norm. Moreover, the TLl penalty boasts unbiasedness and Lipschitz continuity besides sparsity 

m, m- 

It is the goal of this paper to extend TLl penalty to TSl (transformed Sehatten-1) for low rank matrix 
completion and compare it with state of the art methods in the literature. 

The rest of the paper is organized as follows. In section 2, we present the transformed Schatten-1 function 
(TSl), the TSl regularized minimization problems, and a derivation of thresholding representation of the 
global minimum. In section 3, we propose two thresholding algorithms (TSl-sl and TSl-s2) based on a 
fixed point equation of the global minimum. In section 4, we compare TSl algorithms with some state- 
of-the-art algorithms through numerical experiments in low rank matrix recovery and image inpainting. 
Concluding remarks are in section 5. 

A. Notation 

Here we set the notations for this paper. Two kinds of inner products are used in the following sections, 
one is between matrices and one is a bilinear operation for vectors: 

{x.y) = Y.^iyi foi" vectors x,y] 

(X, Y) =ii{Y^X) = Y. XijYij for matrices X, Y. 
hj 

Assume matrix X G has r positive singular values di > (T 2 > ... > cr^ > 0. Let us introduce some 
common matrix norms or quasi-norms as, 

r 

• Nuclear norm: ||X||* = ^(Ji; 

i=l 

• Sehatten p quasi-norm: ||X||p = 

i=l 

• Frobenius norm: ||X||f = (^cr/)^. Also ||X|||, = (X,X) = ^X/■. 

i=l i,j 

k 

• Ky Fan fc-norm: \\X\\Fk = Y'^i^ for l<k<r; 

i=l 

• Induced norm: ||X|| 2 , 2 = max ||Xu ||2 = ai. 

Ihll2=i 

Define function vec(-) to unfold one matrix columnwise into a vector. So it is clearly that ||vec(A')|h = 
||X||i?, where the left hand side norm is vector’s £2 norm. 

Define the shrinkage identity k matrix as following, 

r = 1, the first k diagonal elements] 

\Fk{iN) = ^, others. ^ 
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Operator trk(-) is defined as the first k partial trace of a matrix, 

k 

= (1-7) 

i=l 

The following matrix functions will be used in the proof of next section, and we want to write them 
out first here for reference: 

C^{X) = \\\s^{X)-h\\l + \T{X)- 

C^{X)-\\\^{X)-s^{Z)\\l } + i\\x-z\\l 

= /iAT(X) + f||6||2-^||^(Z)||i-M^W,6-^(Z)) + i||X-Z||^; ^ • 

Bf,{Z) = Z + fi£/*{b-s^{Z)). 


II. TSl MINIMIZATION AND THRESHOLDING REPRESENTATION 

First, let us introduce Transformed Schatten-1 penalty function(TSl) based on the singular values of a 
matrix: 

rank(X) 

T{X)= Y, (2.9) 

i=l 

where pa{-) is a linear-to-linear rational function with parameter aG (0,oo) [[30l, OTl . 

P.(kl) = ^^. (2.10) 

a+ |a;| 

With the change of parameter a, TLl interpolates Iq and h norms: 


lim p„(a:) = /{a;^o}, lim pa(x) = |x|. 

a—).0+ a—>.+co 

In Fig.fU level lines of TLl on the plane are shown at small and large values of parameter a, resembling 
those of h (at a = 100), I 1/2 (at a = l), and Iq (at a = 0.01). 

We shall focus on TSl regularized problem 

min \\W{X)-h\\l + mxl (2.11) 

where the linear transform sZ : determined by p given matrices Ai^...^ApE that 

is, ^{X) = {{A,,X),...,{Ap,X)f. 


A. Overview of TLl minimization 

To set the stage for the discussion of the TSl regularized problem (12.1 II) . we review the following 
results on one-dimensional TLl optimization fl^ . 

Let us consider the unconstrained TLl regularized problem: 

+ (2.12) 

where matrix vector are given, Pa{x) = Yjpai\xi\) and function paf) is as in (12.101) . 

i 

In this subsection of TLl minimization, we want to overwrite operator B^f) over vector x, instead of 
matrix field as before in (11.81) . 

Bp{x) =x +pA'^{y — Ax). (2.13) 


In the following theorem (III.II) . we prove that there exists a closed form expression for proximal operator 
proxxp^ on univariate TLl regularization problem, where proxxp^{x) = aTgmm^{y — x)'^ + Xpa{y). 

ye?R- 
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Fig. 1: Level lines of TLl with different parameters: a = 100 (figure b), a = l (figure e), a = 0.01 (figure 
d). For large parameter a, the graph looks almost the same as h (figure a). While for small value of a, it 
tends to the axis. 


Proximal operator of a eonvex funetion usually intends to solve a small eonvex regularization problem, 
whieh often admits elosed-form formula or an effleient speeialized numerieal methods. However, for non- 
eonvex funetions, like Ip with p G (0.1), their related proximal operators do not have elosed form solutions 
in general. There are many iterative algorithms to approximate optimal solution. But they need more 
eomputing time and sometimes only eonverge to loeal optimal or stationary point. In this subseetion, we 
prove that for TLl function, there indeed exists a closed-formed formula for its optimal solution. 

Different with other thresholding operators, TLl has 2 threshold value formulas depending on regular 
parameter A and TLl parameter ‘a’. We present them here with same notation as 


t* = X9±l 

^ a 


(sub-critical parameter) 
tl = ^2A(a+1) — I (super-critical parameter). 


(2.14) 


The inequality fg < ^2 holds and the equality is realized if and only if A = 2 {a+i) ■ 
Let sgn(-) be the standard signum function with sgn(O) = 0, and 


see 


h\{x) =sgn(a;) < -(a+ |a;|) cos 


p(x] 


2a |x| 


(2.15) 


with p{x) = arccos(l- "^^{a+ixip general, \hx{x)\<\x\, see JSOl. 


Theorem II.l. (l[3d\I} The optimal solution ofy* 
of the form: 

* 

y = 


= argmin{4(|/ —a;)^ +Apa(|j/|)} is a thresholding function 
y&R 

0, \x\<t 

hx(x), |a:| > t 


(2.16) 
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where hx{-) is defined in ^2.i5D . and the threshold parameter t depends on A as follows: 
2 

1) A < 2 (a+i) (sub-critical and critical), 


2 

2) if \> 2 {a+i) (super-critical), 

t = tl = V'2A(a + l) - 

According to the above theorem, we introduce thresholding operator g\^a(-) in 3^, 


9\a(w) = 


0 , 


if |tu| <t; 


hx(w), if|tc|>t, 


(2.17) 


where t is the thresholding value in Theorem III. 1 1 and hx(-) in (12.151) . 

In QiPl , the authors proved that when A < 2 (^+ 1 ) ’ TLl threshold function is continuous, same as soft- 

thresholding function (H, [|9]|. While if A > 2 [a+i) > TLl thresholding function has a jump discontinuity at 
threshold value, similar to half-thresholding function [|29l . For different threshold scheme, it is believed that 


continuous formula is more stable, while discontinuous formula separates nonzero and trivial coefficients 
more efficiently and sometimes converges faster. 

We have the following representation theorem for TLl regularized problem (12.121) . 


Theorem 11.2. (fSOll) If x* = (x\,X 2 -,---,xf)'^ is a TLl regularized solution ( 12.721) with a and A being 
positive constants, and 0 < p<\\A\\~^, then letting t = t 2 l t^ 1 +f.s7^^ ^ the optimal solution 

satisfies the fixed point equation: 

x*=gx^,,a([B^,(x*)]i) V7 = l,...,n. (2.18) 

In the following, we will extend this result to TSl low rank matrix completion and propose 2 
thresholding algorithms based on it. 


B. TSl thresholding representation theory 

Here we assume m<n. For a matrix X with rank equal to r, its singular values vector a = 

(cri,...,(Tm) is arranged as 

CT ^ ^2 (Ty' 0 - .. . ^TTl • 

The singular value decomposition (SVD) is X = UDV^, where U = (Uij)jnxm and V = (Vij)nxn are 
unitary matrices, with D = Diag(a) diagonal. 

In [[T 3 II . Ky Fan proved the dominance theorem and derive the following Ky Fan k-norm inequality. 

Lemma II.l. (Ky Fan k-norm inequality) For a matrix X G with SVD: X = UD V'^, where diagonal 
elements of D are arranged in decreasing order, we have: 

that is, trk(X) < trk(7A) = || AHirfo V/c = 1,2, ...,m. The inequalities become equalities if and only if X = D. 
Here matrix If. and operator trk(-) are defined in section \FA\ 

Another proof of this inequality without using dominance theorem is available. We leave it in the 
appendix for readers’ convenience, making the paper self-contained. 
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Theorem II.3. For any matrix Y G which admits a singular value decomposition: Y = 

f/Diagfa) where cr = A global minimizer of min i||X —F||p +AT(X) is: 

= Gx,a{Y) = UDiag{gx,a{cT))V^, (2.19) 

where gx,a{-) Is defined in d2.i7D and applied entrywise to a. 


Proof: First due to the unitary invarianee property of Frobenius norm and Y = UY)ia.g{a)V^, we have 
\\\X-Y\\l + XT(X) = \\\U'^XV-n^g(a)\\% + XT(U^XV). 


So 


=argmini||X-F||| + AT(X) 




( 2 . 20 ) 


( 2 . 21 ) 


= (7 j argmini||X - Dm^(a) III + AT(X) 11/^. 

Next we want to show: 

argmini||X-Diag(a)||| + AT(X) 

= argmin|ogjj^x™ diagonal} i 11^ - Diag(a) ||| + AT(D) 

For any X suppose it admits SVD: X = UxDiag{ax)Vj. Denote 

Dx = F)is.g{(Tx) and = Diag(cr). 

m 

We ean rewrite diagonal matrix Dy as Dy = Y^\IaiIf where V(Ti = crj —cTj+i >0 for i = 1,2, ...,m — 1, and 

i 

m _ 

yam = o'm- So simply, ^ V(Ti = crfc. Note the shrinkaged identity matrix // is defined in seetion iFAl 

i=k 


{X, D,) = (A', E Vff./') = E {-’f. Vff./') 

i i 

m 

<Y.{DxX^iI!) = {Dx,Dy), 

i 

where we used Lemma UlT] for the inequality. The equality holds if and only if X = Dx. 
Thus we have 


IIA -D„\\l = IIAIII + IID.IIf - 2{A,r>„> 

— l|-^a:||F + WDyWjp — 2{Dx, Dy) = ||i9. 


n ||2 

X IliT’* 


Also due to T(X) =T{Dx), 

^\\X-Dy\\l + XT{X)>^\\Dx-Dy\\l + XT{Dx). 

Only when X = is a diagonal matrix, the above will beeome equality. So we finish the proof of 
equation (12.211) . 

Denote a diagonal matrix D as D = Diag{d). Then: 



DiagM|||,+ Ar(D) = ^ 

i=l 



-(Ji\\l + Xpa{\di\) 
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By Theorem III. 1 [ we have (7A,a(<7i) = a^ 5 'inin{ ]r\\d —ai\W + \pa{\d\) }>0. It follows that 

d 

argmin|ogjj™x„and d is diagonal} ill^ -Diag(a)|||. + A T{D) 

= argmini||X-Diag(cT)||^ + A T{X) (^21) 

= Diag(^A,a(cr)). 

In view of (12.201) . the matrix = UDiag{gx,a{o'))V^ is a global minimizer, whieh will be denoted as 
GA,a(A^)-The proof is eomplete. ■ 


Lemma II.2. For any fixed A > 0, /i > 0 and matrix Z G let X® = Gx^,a{B^{Z)), then for any matrix 

X e 


C'A,^(X^Z)<C'A,^(X,Z), 


which means X^ is a global minimizer of Cx,ii{X,Z). Here the matrix function Cx,i_i{X,Z) is defined in 
di.^D of section \I-A\ 


Proof: First, we will rewrite the formula of Cx,^i{X,Z). Note that £/(X) and £/(Z) are veetors in 
spaee 3?^. Thus in the formula of Cx,fiX,Z), there exist norms and inner products for both matrices and 
vectors. By definition, 


CxjX,Z) = i||X|||-(X,Z) + A||Z||| + A/iT(X) + f ||6||i 
-p(x/(X),b-^(Z))-§llx/(Z)lll 

= ill^ll^+ill^lll+fll&lli-fK(^)lli 

+ApT(X)-( X,Z + /u£/*(b-^(Z)) } 

= A||X-5,(Z)||^ + A/iT(X) 

-iP.(^)lll+ill^llKfll&lli-fK(^)lli 

Thus if we fix matrix Z, 

arg min (7 a,/, (X, Z) = arg min A 11X - ( Z) 1 + A/iT (X ) 

Then by Theorem III.31 X® is a global minimizer. 


(2.23) 


(2.24) 


Theorem II.4. For fixed parameters, A>0 and 0</i< ||^|| 2 ^- If X* is a global minimizer for problem 
CxiX), then X* is also a global minimizer for problem min (7 au(X,X*), that is 

CxAX*,X*)<CxAX,X*), VXe3fJ™^". 

Proof: 

CxAX.X*) = p{\\\s:F{X)-b\\l + XT{X)} 

+\{\\X-X*\\l-p\\s^{X)-x^{X*)\\l} 

>p{ |K(X)-6||i + AT(X) } = pCx{X) 

>pCx{X*) = CxAXfX*) 

The first inequality is due to the fact: 

K(X)-.5/(X*)||i = ||x4vec(X)-x4vec(X*)||2 

<||A||2 ||vec(X-X*)||2 (2.26) 

<K||i ||X-X*||| 


By the above Theorems and Lemmas, if X* is a global minimizer of (7 a(X), it is also a global minimizer 
of Cx,fi{X,Z) with Z = X*, which has a closed form solution formula. Thus we arrive at the following 
fixed point equation for the global minimizer X*: 

X* = GxUB>.{X*)). 


(2.27) 
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Suppose the SVD for matrix is U'Dia.g{al)V'^, then 

X* = [/Diag((7AM,aK))i/^, 

whieh means that the singular values of X* satisfy a* = for i = 


III. TS 1 THRESHOLDING ALGORITHMS 

Next we will utilize fixed point equation (12.271) to derive two thresholding algorithms for TSl regularized 
problem (12.111) . As in 1(301 . OTl . from the equation X* = Gx^i,a{.B^{X*)) = UY)mg{g\fj,^a{.cr))y'^, we will 
replaee optimal matrix X* with X^ on the left and X^~^ on the right at the fc-th step of iteration as: 


X"=GA;..a(5^(X"-')) 

= t/"-iDiag(^AM,a(cr'=-'))^ 


k-l,T 


(3.28) 


where unitary matriees ^ and singular values {a^ eome from the SVD deeomposition of 

matrix B^{X^~^). Operator gx^i,a{-) is defined in (12.171) . and 


( \ _ / O 5 if |tu| < t; 
~ \ hx^,{w), if \w\>t. 


Recall that the thresholding parameter t is: 


t = 


if A < 


= V2A/i(a+l)-f,if A> 


— 2(a+l)At’ 


(3.29) 


(3.30) 


2{a+l)ii ■ 


With an initial matrix X°, we obtain an iterative algorithm, called TSl iterative thresholding (IT) 
algorithm. It is the basic TSl iterative scheme. Later, two adaptive and more efficient IT algorithms 
(TSl-si and TSl-s2) will be introduced. 


A. Semi-Adaptive Thresholding Algorithm - TSl-si 

We begin with formulating an optimal condition for regularization parameter A, which serves as the 
basis for the parameter selection and updating in this semi-adaptive algorithm. 

Suppose optimal solution matrix X has rank r, by prior knowledge or estimation. Here, we still assume 
m<n. For any p, denote i?^(X) =X-|-/iA^(6 —.e/(X)) and are the m non-negative singular 

values for B^{X). 

Suppose that X* is the optimal solution matrix of (12.111) . and the singular values of matrix B^{X*) 
are denoted as ex* > 0-2 >... >cr))^. Then by the fixed equation (12.271) . the following inequalities hold: 


where t is our threshold value. Recall that T^<t<t 2 . So 


It follows that 


a;>t>t% = x/2Xp{a +1) - f; 
K+,<t<tl = Xp^. 


X.= 




/i(a -f 1) 


^ A ^ A 2 


(a-f 2(7*)^ 
8 (a-|- l)/x 


or A* G [Ai, A 2 ]. 

The above estimate helps to set optimal regularization parameter. A choice of A* is 


A* 




if 

if Ai > ° , 

^ 2(a+l)/i’ 


then A* < ^^ — ^2; 

then A* > 2(a+i)^ ^^ = ^3- 


(3.31) 


(3.32) 


(3.33) 






















^ ( I ^ \ 2 

In practice, we approximate B^{X*) by in (13.331) . so Ai = and A 2 = -77 -—• We 


ehoose optimal parameter A at the n-th step as 


/i(a + l)' 


8(a + l)/i 


a:= 


Ai, if Ai ^ 
A 2 , if Ai > 


2{a+l)^l■' 

o? 

2(a+l)/j, 


(3.34) 


This way, we obtain an adaptive iterative algorithm without pre-setting the regularization parameter 
A. The TLl parameter a is still free and needs to be seleeted beforehand. Thus the algorithm is overall 
semi-adaptive, ealled TSl-sl for short and summarized in Algorithm [TJ 
Algorithm 1: TSl-sl threshold algorithm 

Initialize: Given and parameter /i and a. 
while NOT converged do 

1 . Y^ = B^{X^)=X^-iis2^*{s^{X^)-h), 

and eompute SVD of F" as = UY)i&g{a)V'^ ; 

2. Determine the value for A” by (13.341) . 

then obtain related threshold value by (13.301) : 

3. = (7Diag(^?A>,a(a))F'^; 

Then, n^n + 1. 

end while 


B. Adaptive Thresholding Algorithm - TSl-s2 

Different from TSl-sl where the parameter ’a’ needs to be determined manually, here at eaeh iterative 
step, we ehoose a = an such that equality An = holds. The threshold value t is given by a single 

formula with f = in¬ 
putting A= 2 {a+i)n critieal value, the parameter a is expressed as: 

a = Xp y/(A/i)^-f 2A/i. 

The threshold value is: 

fl-f 1 A/i \/ (A/i)^-|-2A/i 

t = Xa -=-h -. 

^ a 2 2 

Let X* be the TLl optimal solution and a* be the singular values for matrix Then we have 

the following inequalities: 

<J* <t ^ j e{r + l,r + 2,...,m}. ^ ' 

So, for parameter A, we have: 

1 -f 2 (T*_|_j^ 1 -f 2(7* 

Onee the value of A is determined, the parameter a is given by (13.351) . 

In the iterative method, we approximate the optimal solution X* by X'^ and further use B^{X^ys 
singular values {crf}i to replaee those of B^{X*). The resulting parameter seleetion is: 

Qjn An/^n “f y/ {Xndn) T 2An/^n- 


(3.35) 

(3.36) 


(3.38) 
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In this algorithm (TSl-s2 for short), only parameter /r is fixed, satisfying inequality /iG (0,||A|| ^). Its 
algorithm is summarized in Algorithm [2l 
Algorithm 2: TSl-s2 threshold algorithm 
Initialize: Given and parameter fi. 
while NOT eonverged do 

1. = X'^ — — and eompute SVD of F” as F" = [/Diag(cr) ; 

2. Determine the values for A” and aP' by (13.381) . 
then update threshold value = 

3. = GA.^,a^(F-) = (7Diag(i7A"^,a(4)^'^; 

Then n^n + 1. 

end while 


IV. Numerical experiments 

In this seetion, we present numerical experiments to illustrate the effectiveness of our Algorithms: semi- 
adaptive TSl-sl and adaptive TSl-s2, compared with several state-of-art solvers on matrix completion 
problems 0. The comparison solvers include: 

. LMaFit dal, 

. FPCA m\ . 

• sIRLs-q [[2^. 

• IRucLq-M [|20l . 

. LRGeomCG (Ml 

The code LMAFit solves a low-rank factorization model, instead of computing SVD which usually takes 
a big chunk of computation time. Also part of its codes is written in C, same as LRGeomCG. So once this 
method converges, it is the fastest method among all comparisons. All others codes are implemented under 
Matlab environment and involve SVD approximated by fast Monte Carlo algorithms lITOl . ifTTI . FPCA 
is a nuclear norm minimization code, while sIRLs-q and IRucLq-M are iterative reweighted least square 
algorithms for Schatten-q quasi-norm optimizations. LRGeomCG algorithm explores matrix completion 
based on Riemannian optimization. It tries to minimize the least-square distance on the sampling set 
over the Riemannian manifold of fixed-rank matrices. When the rank information is known priori or well 
approximated, this method is efficient and accurate, as shown in these experiments below, especially for 
standard Gaussian matrices. But a drawback of LRGeomCG is that the rank of the manifold is fixed. 
Basically, it is hard for it to handle unknown rank cases. 

In our TSl algorithms, MC SVD algorithm [fTTlI is implemented at each iteration step, same as FPCA. 
We also tried another fast SVD approximation algorithms, but MC SVD is the most suitable one, satisfying 
both speed and accuracy requirements in one iterative algorithm. All our tests were performed on a Lenovo 
desktop: 16 GB of RAM and InteK® Core Quad processor i7-4770 with CPU at 3.40GHz under 64-bit 
Ubuntu system. 

We tested and compared these solvers on low rank matrix completion problems under various conditions, 
including multivariate Gaussian, uniform and -yP distributions. We also tested the algorithms on grayscale 
image recovery from partial observations (image inpainting). 

A. Implementation details 

In the following series of tests, we generated random matrices 

M = MLMlenmxn, 

where matrices Mi and are in spaces 77™^^ and respectively. 

*TS1 matlab codes can be downloaded from https://github.com/zsivine/TSl-algorithms 
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By setting parameter r to be small, we obtain a low rank matrix M with rank at most r. After this 
step, we uniformly random-sampled a subset oo with p entries from M. The following quantities help to 
quantify the diffieulty of a reeovery problem. 

• SR (Sampling ratio): SR = p/mn. 

• FR (Freedom ratio): FR = r{m + n — r)/p, whieh is the freedom of rank r matrix divided by the 
number of measurement. Aeeording to [|2^ , if FR > 1, there are infinite number of matriees with 
rank r and the given entries. 

• Tm (Maximum rank with which the matrix can be recovered): 

m + n-^(m + n)2-4p r , 

rm=l --J (floor function), 

which is defined as the largest rank such that FR < 1. 

The TS1 thresholding algorithms do not guarantee a global minimum in general, similar to non-convex 
schemes in 1-dimensional compressed sensing problems. Indeed we observe that TSl thresholding with 
random starts may get stuck at local minima especially when parameter FR (freedom ratio) is high or 
the matrix completion is difficult. A good initial matrix is important for thresholding algorithms. In 
our numerical experiments, instead of choosing = 0 or random, we set equal to matrix M whose 
elements are as observed on and zero elsewhere. 

The stopping criterion is 

- FT, - ;r-^<tol 

max{|| 1} 


where and X" are numerical results from two contiguous iterative steps, and tol is a moderately 

small number. In all these following experiments, we fix tol = 10“® with maximum iteration steps 1000. 
We also use the relative error 

\\Xo^t-M\\F 


rel.err = 


\\Mh 


(4.39) 


to estimate the closeness of Xopt to M, where X^pt is the ’’optimal” solution produced by all numerical 
algorithms. 

1) Rank estimation: For thresholding algorithms, rank r is the most important parameter, especially 
for our TSl methods, where thresholding value t is determined based on r. If the true rank r is unknown, 
we adopt the rank decreasing estimation method (also called maximum eigengap method) as in [|20l . 
(EH, thereby extending both TSl-si and TSl-s2 schemes to work with an overestimated initial rank 
parameter K. In the following tests, unless otherwise specified, we set K = [l.SrJ. The idea behind this 
estimation method is as follows. Suppose that at step n, our current matrix is X. The eigenvalues of X'^X 
are arranged with descending order and > Xr^^^+i >■■■> Xk+i > 0 is the through K -f 1-th 

eigenvalues of X'^X, where is manually specified minimum rank estimate. Then we compute the 
quotient sequence Xi = \i/Xi+i, i = rmin,---,K. Let 

K = argmin Aj, 


the corresponding index for maximal element of {Aj}. If the eigenvalue gap indicator 

T = X^{K-Train+ ^)/^Xi > 10 , 

i^k 


we adjust our rank estimator from K to K. During numerical simulations, we did this adjustment only 
once for each problem. In most cases, this estimation adjustment is quite satisfactory and the adjusted 
estimate is very close to the true rank r. 
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Parameter test for a with rank known prior 


Parameter test for a with rank estimation 


= 0.6 


I 0.5 
I 0.4 


0.9 


0.1 


0.2 


0.3 


0.8 


10 


♦ ♦ t ♦ ♦ ♦ 


15 



rank 


rank with fr changing from 0.23 to 0.78 


Rank is known a prior 


Rank is estimated 


Fig. 2: Optimal parameter test for semi-adaptive method: TSl-sl 


2) Choice of a: optimal parameter testing for TSl-sl: A major differenee between TSl-sl and TSl-s2 
is the ehoiee of parameter a, whieh infiuenees the behaviour of penalty funetion paf) of TSl. When ’a’ 
tends to zero, the funetion T(X) approaehes the rank. 

We tested TSl-sl on small size low rank matrix completion with different ‘a’ values, varying among 
{0.1,0.5,1,10,100}, for both known rank scheme and the scheme with rank estimation. In these tests, 
M = MlM]^ is a 100 x 100 random matrix, where Ml and Mr are generated under i.i.d standard normal 
distribution. The rank r of M varies from 10 to 22. 

For each value of ‘a’, we conducted 50 independent tests with different M and sample index set oj. 
We declared M to be recovered successfully if the relative error (14.391) was less than 5 x 10“^. The test 
results for known rank scheme and rank estimation scheme are both shown in Figure [2l The success rate 
curves of rank estimation scheme are not as clustered as those of known rank scheme. In order to clearly 
identify the optimal parameter ’a’, we ignored the curve of a = 0.1 in the right figure as it is always below 
all others. The vertical red dotted line there indicates the position where FR =0.6. 

It is interesting to see that for known rank scheme, parameter a = 1 is the optimal strategy, which 
coincides with the optimal parameter setting in ll30l . It is observed that when we use thresholding algorithm 
under transformed LI (TLl) or transformed Schatten-1 (TSl) quasi norm, it is usually optimal to set 
a = 1 with given information of sparsity or rank. However, for the scheme with rank estimation, it is more 
complicated. Based on our tests, if FR <0.6, it is better to set a >100 to reach good performance. On 
the other hand, if FR >0.6, a = 10 is nearly the optimal choice. So for all the following tests, when we 
apply TSl-sl with rank estimation, the parameter a is set to be 



1000, if FR<0.6; 
10, if FR>0.6. 


In applications where FR is not available, we suggest to use a = 10, since its performance is also 
acceptable if FR <0.6. 

B. Completion of Random Matrices 

The ground truth matrix M is generated as the matrix product of two low rank matrices Ml and Mr. 
Their dimensions are mxr and nxr respectively, with r<<min(m,n). In these following experiments, 
except clearly stated, Mr and Mr are generated with multivariate normal distribution AA(/r, S), with /r = 1 
and 


S = {{1 — COV) * -fCOflrxr 

determined by parameter cov. Thus matrix M = has rank at most r. 
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It is known that success recovery is related to FR. The higher FR is, the harder it is to reeover the 
original low rank matrix. In the first bateh of tests, we varied rank r and fixed all other parameters, i.e. 
matrix size (m,n), sampling rate (sr). Thus FR was ehanging along with rank. 

It is observed that the performanee of TSI-sI and TSI-s2 are very different, due to adopting single 
or double thresholds. TSl-s2 uses only one (smooth) thresholding seheme with ehanging parameter a. 
It eonverges faster than TSI-sl when the rank is known, see subseetion IIV-B 1 [ On the other hand, 
TSl-sI utili z es two (smooth and diseontinuous) thresholding sehemes, and is more robust in ease of 
overestimated rank. TSl-sl outperforms TSl-s2 when rank estimation is used in lieu of the true rank 
value, see subsection IIV-B2I IRucL-q method is found to be very robust for varied eovarianee and rank 
estimation, yet it underperforms TSl methods at high FR, even with more eomputing time. Though TSl 
methods rely on the same rank estimation method as IRueL-q, IRueL-q aehieves the best results in the 
absenee of true rank value. A possible reason is that in IRueL-q iterations, the singular values of matrix 
X are eomputed more aeeurately. In TSl, singular values are eomputed by fast Monte Carlo method at 
every iteration. Due to random sampling of Monte Carlo method, there are more errors espeeially at the 
beginning stage of iteration. The resulting matriees may cause less accurate rank estimation. 

1) Matrix completion with known rank: In this subseetion, we implemented all six algorithms under 
the eondition that true rank value is given. They are TSl-sl, TSl-s2, sIRLS-q, IRueL-q, LMaFit and 
LRGeomCG. We skipped FPCA since rank is always adaptively estimated there. 

Gaussian matrices with different ranks: In these tests, matrix M = MlM]^ was generated under 
uneorrelated normal distribution with /i = l. We eondueted tests both on low dimensional matriees with 
m = ?7. = 100 (Table HI) and high dimensional matriees with m = ?7. = 1000 (Table HU). Tests on non-square 
matriees with m^n show similar results. 

In Table HI rank r varies from 5 to 18, while FR inereases from 0.2437 up to 0.8190. For lower rank 
(less than 15), LMaFit is the best algorithm with low relative errors and fast eonvergenee speed. Part of 
the reason is that this method does not involve SVD (singular value decomposition) operations during 
iteration. 

LRGeomCG approaches the performance of LMaFit when r < 10. However, as FR values are above 
0.7, it became hard for LMaFit to find truth low rank matrix M. Its performanee is not as good as stated 
in paper 041 with possible reason that we generate M with mean p equal to 1, instead of 0 in 041 . We 
also tested LRGeomCG with /i = 0 where it has very small relative error and also fast eonvergenee rate. 

It is also notieed that in Table HI the two TS1 algorithms performed very well and remained stable for 
different FR values. At similar order of aeeuraey, the TLls are faster than IRueL-q. 

For large size matriees (m = n = 1000), rank r is varied from 50 to 110, see table HD The sIRLS-q and 
LMaFit only worked for lower FR. IRueL-q ean still produee satisfactory results with relative error around 
10“^, but its iterations took longer time. In EOl . it was carried out by high speed-performance CPU with 
many cores. Here we used an ordinary proeessor with only 4 cores and 8 threads. It is believed that with 
a better machine, IRueL-q will be much faster, sinee parallel computing is embedded in its codes. As 
seen in the table, LRGeomCG is always eonvergent and aehieves almost same aeeuraey with TSl-sl and 
TSl-s2. However, its eomputation time grows fast with inereasing rank. 

A little differenee between the two TS 1 algorithms began to emerge when matrix size is large. Although 
when rank is given, they all performed better than other sehemes, adaptive TSl-s2 is a little faster than 
semi-adaptive TSl-sl. It is believed by ehoosing optimal parameter a, TSl-sl will be improved. The 
parameter a is related to matrix M, i.e. how it is generated, its inner strueture, and dimension. In TSl-s2, 
the value of parameter a does not need to be manually determined. 

Gaussian Matrices with Different Covariance: In this subsection, the rank r, the sampling rate, and 
the freedom ratio FR are fixed. We varied parameter cov to generate eovarianee matrices of multivariate 
normal distribution. 

In Table HID we ehose two rank values, r = 5 and r = 8. It is harder to reeover the original matrix 
M when it is more eoherent. IRueL-q does better in this regime. Its mean eomputing time and relative 
errors are less influeneed by the ehanging cov. Results on large size matriees are shown in Table HYl 
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TABLE I: Comparison of TSl-sl, TSl-s2, sIRLS-q, IRucL-q, LMaFit and LRGeomCG on recovery of 
uncorrelated multivariate Gaussian matrices at known rank, m = n = 100, SR = 0.4, with stopping criterion 
fo/ = 10-^ 


Problem 

TSl-sl 

TSl-s2 

sIRLS 

q* 1 

rank 

FR 

rel.err 

time 

rel.err 

time 

rel.err 

time 

5 

0.2437 

1.89e-05 

0.11 

7.58e-07 

0.13 

7.09e-06 

0.80 

6 

0.2910 

7.13e-06 

0.14 

7.37e-07 

0.15 

8.59e-06 

1.01 

7 

0.3377 

1.39e-05 

0.15 

6.34e-07 

0.17 

8.14e-06 

1.09 

8 

0.3840 

2.04e-05 

0.16 

7.70e-07 

0.20 

1.31e-05 

1.43 

9 

0.4298 

2.08e-05 

0.23 

9.97e-07 

0.25 

2.02e-05 

1.88 

10 

0.4750 

3.26e-05 

0.33 

l.lle-06 

0.34 

1.93e-02 

4.49 

14 

0.6510 

l.lOe-05 

0.53 

1.03e-05 

0.52 

— 

— 

15 

0.6937 

1.05e-05 

0.66 

9.88e-06 

0.64 

— 

— 

16 

0.7360 

3.86e-05 

0.91 

1.79e-05 

0.87 

— 

— 

17 

0.7778 

1.50e-04 

1.03 

7.10e-05 

1.00 

— 

— 

18 

0.8190 

5.63e-04 

1.00 

4.15e-04 

1.00 

— 

— 

Problem 

IRucL 

-q 

LMaFit 

LRGeomCG 

rank 

FR 

rel.err 

time 

rel.err 

time 

rel.err 

time 

5 

0.2437 

7.86e-06 

1.82 

1.96e-06 

0.02 

L03e-06 

0.03 

6 

0.2910 

1.14e-05 

2.15 

2.18e-06 

0.02 

L22e-06 

0.04 

7 

0.3377 

1.28e-05 

2.24 

2.27e-06 

0.03 

L37e-06 

0.05 

8 

0.3840 

3.03e-05 

2.33 

2.67e-06 

0.03 

L66e-06 

0.06 

9 

0.4298 

1.68e-04 

2.38 

3.21e-06 

0.05 

L88e-06 

0.07 

10 

0.4750 

3.21e-04 

2.49 

3.54e-06 

0.08 

L87e-06 

0.08 

14 

0.6510 

3.80e-05 

7.25 

5.74e-06 

0.21 

3.20e-02 

0.34 

15 

0.6937 

5.28e-05 

9.29 

5.87e-02 

0.33 

3.49e-02 

0.47 

16 

0.7360 

7.57e-05 

12.34 

1.44e-01 

0.34 

L91e-01 

0.99 

17 

0.7778 

9.40e-05 

15.31 

3.80e-01 

0.39 

5.73e-01 

0.71 

18 

0.8190 

1.49e-04 

22.27 

4.43e-01 

0.40 

9.17e-01 

0.94 


* Notes: 1. The sIRLS-q iterations did not converge when rank > 14 and FR >0.65. Comparison is skipped over this range. 
2. Matrix M is generated from multivariate normal distribution with mean /i= 1, instead of 0. 

TABLE II: Numerical experiments on recovery of uncorrelated multivariate Gaussian matrices at known 
rank, m = n=1000, SR =0.3. 


Problem 

TSl-sl 

TSl-s2 

sIRLS 

-q 1 

rank 

FR 

rel.err 

time 

rel.err 

time 

rel.err 

time 

50 

0.3250 

5.95e-06 

8.06 

5.88e-06 

6.95 

4.85e-06 

45.20 

70 

0.4503 

6.94e-06 

13.37 

6.78e-06 

11.95 

2.46e-02 

128.65 

90 

0.5730 

7.83e-06 

22.13 

7.77e-06 

18.81 

9.86e-02 

206.32 

110 

0.6930 

L23e-04 

29.91 

3.47e-05 

29.50 

2.27e-01 

282.84 

Problem 

IRucL 

-q 

LMaFit 

LRGeomCG 

rank 

FR 

rel.err 

time 

rel.err 

time 

rel.err 

time 

50 

0.3250 

9.55e-06 

485.30 

L74e-06 

6.04 

Llle-06 

8.31 

70 

0.4503 

3.77e-05 

606.95 

3.54e-02 

23.20 

L50e-06 

20.87 

90 

0.5730 

4.16e-04 

623.37 

L60e-01 

24.94 

2.13e-06 

52.77 

110 

0.6930 

2.41e-03 

640.66 

2.45e-01 

29.19 

3.22e-06 

112.30 


TSl-s2 scheme is much better than TSl-sl, both in relative error and computing time. In small size 
matrix experiments, TSl-s2 is the best among comparisons. 

In Table HVl we fixed rank =30 with cov among {0.1,...,0.7}. TSl-s2 is still satisfactory both in 
accuracy and speed for low covariance (i.e cov <0.6). However, for cov >0.7, relative errors increased 
from 10“® to around 10“^. It is also observed that IRucL-q algorithm is very stable and robust under 
covariance change. 

Matrices from other distributions: We also compare algorithms with other distributions, including (0,1) 
uniform distribution and Chi-square distribution with k = 1 (degree of freedom). All other parameters are 
same as Table HI The results are displayed at Table |V] (uniform distribution) and Table |Vl] (Chi-square 
distribution). Only partial numerical results are showed here with rank r = 7,8,9,10,14,15. From these 
two tables, two TS1 algorithms have satisfying relative errors and stable performance, same as IRuccL-q. 
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TABLE III: Numerical experiments on multivariate Gaussian matrices with varying covariance at known 
rank, m = n= 100, SR =0.4. 


Problem 

TSl-sl 

TSl-s2 

sIRLS 

q 1 

rank 

cor 

rel.err 

time 

rel.err 

time 

rel.err 

time 

5 

0.5 

6.44e-06 

0.17 

5.74e-07 

0.12 

3.35e-02 

3.75 

5 

0.6 

7.28e-06 

0.28 

7.15e-07 

0.13 

1.34e-01 

5.58 

5 

0.7 

3.32e-02 

0.58 

7.65e-07 

0.17 

2.15e-01 

6.16 

8 

0.4 

7.55e-06 

0.34 

7.96e-07 

0.21 

1.43e-01 

6.47 

8 

0.5 

9.84e-03 

0.51 

6.14e-06 

0.19 

2.68e-01 

6.19 

8 

0.6 

3.01e-02 

0.81 

7.71e-06 

0.23 

2.95e-01 

6.26 

8 

0.7 

6.86e-02 

0.86 

7.16e-06 

0.50 

3.33e-01 

6.80 

Problem 

IRucL 

q 

LMaFit 

LRGeomCG 

rank 

cor 

rel.err 

time 

rel.err 

time 

rel.err 

time 

5 

0.5 

8.21e-06 

1.86 

2.48e-02 

0.07 

1.12e-06 

0.06 

5 

0.6 

8.76e-06 

1.85 

4.48e-02 

0.15 

6.98e-02 

0.09 

5 

0.7 

1.37e-05 

1.71 

l.lOe-01 

0.27 

1.22e-01 

0.11 

8 

0.4 

1.92e-05 

2.50 

1.98e-02 

0.18 

5.42e-02 

0.17 

8 

0.5 

1.38e-05 

2.54 

1.21e-01 

0.25 

1.17e-01 

0.17 

8 

0.6 

1.40e-05 

2.51 

1.85e-01 

0.27 

1.83e-01 

0.23 

8 

0.7 

l.lOe-05 

2.35 

2.44e-01 

0.25 

2.21e-01 

0.29 


TABLE rV: Numerical experiments on multivariate Gaussian matrices with varying covariance at known 
rank, m = n= 1000, SR =0.4. 


Problem 

TSl-sl 

TSl-s2 

sIRLS 

■q 1 

rank 

cor 

rel.err 

time 

rel.err 

time 

rel.err 

time 

30 

0.1 

3.07e-06 

9.71 

3.07e-06 

3.98 

4.36e-07 

13.80 

30 

0.2 

2.90e-06 

11.07 

2.94e-06 

3.92 

L28e-05 

33.89 

30 

0.3 

5.54e-03 

26.64 

3.02e-06 

4.13 

6.65e-02 

46.02 

30 

0.4 

1.19e-02 

28.58 

3.08e-06 

4.31 

L08e-01 

50.95 

30 

0.5 

4.76e-02 

34.25 

2.89e-06 

5.89 

L50e-01 

52.64 

30 

0.6 

6.89e-02 

35.69 

2.89e-06 

10.28 

L89e-01 

55.70 

30 

0.7 

8.01e-02 

33.92 

6.99e-04 

20.09 

2.03e-01 

51.03 

Problem 

IRucL 

-q 

LMaFit 

LRGeomCG 

rank 

cor 

rel.err 

time 

rel.err 

time 

rel.err 

time 

30 

0.1 

3.13e-06 

222.90 

L19e-06 

1.83 

6.77e-07 

4.88 

30 

0.2 

3.16e-06 

221.34 

L14e-06 

3.16 

5.68e-07 

8.84 

30 

0.3 

3.05e-06 

218.57 

L21e-06 

6.93 

5.45e-03 

15.45 

30 

0.4 

3.29e-06 

214.52 

2.06e-02 

14.72 

4.82e-02 

19.15 

30 

0.5 

3.12e-06 

209.05 

6.45e-02 

17.34 

8.41e-02 

20.99 

30 

0.6 

3.30e-06 

207.94 

9.09e-02 

18.38 

L42e-01 

21.81 

30 

0.7 

3.15e-06 

210.06 

L15e-01 

16.37 

L67e-01 

21.63 


Eor these two non-Gaussian distributions, it becomes harder to successfully recover low rank matrix for 
LMaEit and LRGeomCG, especially when rank r > 10. 

2) Matrix completion with rank estimation: We conducted numerical experiments on rank estimation 
schemes. The initial rank estimation is given as 1.5r, which is a commonly used overestimate. EPCA 
[|23l is included for comparison, while LRGeomCG and sIRLS-q are excluded. EPCA is a fast and robust 
iterative algorithm based on nuclear norm regularization. 

We considered two classes of matrices: uncorrelated Gaussian matrices with changing rank; correlated 
Gaussian matrices with fixed rank (r = 5,10). The results are shown in Table IVIII and Table IVllll It 
is interesting that under rank estimation, the semi-adaptive TSl-sl fared much better than TSl-s2. In 
low rank and low covariance cases, TSl-sl is the best in terms of accuracy and computing time among 
comparisons. However, in the regime of high covariance and rank, it became harder for TSl methods to 
perform efficient recovery. IRucL-q did the best, being both stable and robust. In the most difficult case, 
at rank = 15 and PR approximately equal to 0.7, IRucL-q can still obtain an accurate result with relative 
error around 10“®. 
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TABLE V: Comparison with random matrices generated from (0,1) uniform distribution. Rank r is given 
and m = n = 100, SR =0.4, with stopping criterion to/= 10“®. 


Problem 

1 TSl-sl 

1 TSl-s2 

sIRLS- 

■q* 1 

rank 

FR 

rel.err 

time 

rel.err 

time 

rel.err 

time 

7 

0.3377 

5.67e-06 

0.16 

5.30e-06 

0.14 

7.30e-06 

1.85 

8 

0.3840 

6.73e-06 

0.18 

6.46e-06 

0.15 

1.96e-02 

3.78 

9 

0.4298 

9.13e-06 

0.24 

8.42e-06 

0.20 

— 

— 

10 

0.4750 

7.62e-06 

0.27 

7.12e-06 

0.20 

— 

— 

14 

0.6510 

2.23e-05 

0.59 

9.24e-06 

0.44 

— 

— 

15 

0.6937 

2.34e-05 

0.81 

1.12e-05 

0.58 

— 

— 

Problem 

IRucL- 

■q 

1 LMaFit 

LRGeomCG 

rank 

FR 

rel.err 

time 

rel.err 

time 

rel.err 

time 

7 

0.3377 

9.55e-06 

5.00 

1.98e-06 

0.05 

1.48e-06 

0.08 

8 

0.3840 

1.08e-05 

4.86 

2.41e-06 

0.06 

1.58e-06 

0.10 

9 

0.4298 

1.57e-05 

6.48 

2.26e-02 

0.13 

2.01e-06 

0.14 

10 

0.4750 

1.80e-05 

7.09 

7.28e-03 

0.11 

2.09e-06 

0.13 

14 

0.6510 

3.75e-05 

13.15 

1.66e-01 

0.18 

1.24e-01 

0.44 

15 

0.6937 

5.58e-05 

17.14 

2.18e-01 

0.16 

1.71e-01 

0.76 


TABLE VI: Comparison with random matrices generated from Chi-square distribution with k = 1 (degree 
of freedom). Rank r is given and m = n = 100, SR =0.4, with stopping criterion to/= 10“®. 


Problem 

1 TSl-sl 

1 TSl-s2 

sIRLS- 

■q* 1 

rank 

FR 

rel.err 

time 

rel.err 

time 

rel.err 

time 

7 

0.3377 

9.09e-06 

0.23 

8.56e-06 

0.20 

L82e-05 

1.84 

8 

0.3840 

1.06e-05 

0.27 

8.31e-06 

0.22 

L69e-02 

2.59 

9 

0.4298 

9.90e-06 

0.30 

8.79e-06 

0.25 

— 

— 

10 

0.4750 

9.52e-06 

0.33 

8.64e-06 

0.28 

— 

— 

14 

0.6510 

1.48e-05 

0.64 

1.20e-05 

0.58 

— 

— 

15 

0.6937 

2.23e-05 

0.83 

1.32e-05 

0.73 

— 

— 

Problem 

IRucL- 

■q 

1 LMaFit 

LRGeomCG 

rank 

FR 

rel.err 

time 

rel.err 

time 

rel.err 

time 

7 

0.3377 

1.26e-05 

5.65 

3.08e-06 

0.04 

L80e-06 

0.05 

8 

0.3840 

1.70e-05 

7.15 

3.29e-06 

0.04 

2.19e-06 

0.06 

9 

0.4298 

2.21e-05 

8.33 

3.75e-06 

0.08 

6.83e-03 

0.11 

10 

0.4750 

2.23e-05 

8.56 

4.25e-06 

0.09 

5.93e-02 

0.14 

14 

0.6510 

5.50e-05 

14.69 

L44e-01 

0.15 

L46e-01 

0.34 

15 

0.6937 

6.61e-05 

17.75 

2.54e-01 

0.15 

3.03e-01 

0.57 


TABLE VII: Numerical experiments for low rank matrix completion algorithms under rank estimation. 
True matrices are uncorrelated multivariate Gaussian, m = n = 100, SR =0.4. 


Problem 

TSl-sl 

TSl-s2 

FPCA 

IRucL-q 

LMaFit 

rank 

FR 

rel.err time 

rel.err time 

rel.err time 

rel.err time 

rel.err time 

10 

0.4750 

7.46e-06 0.31 

2.43e-03 0.38 

2.26e-01 0.91 

L84e-05 3.41 

2.64e-01 0.01 

11 

0.5198 

L04e-05 0.35 

L15e-02 0.52 

2.23e-01 0.88 

2.15e-05 4.09 

2.48e-01 0.01 

12 

0.5640 

9.94e-06 0.44 

7.62e-03 0.54 

2.28e-01 0.92 

2.51e-05 4.46 

2.44e-01 0.01 

13 

0.6078 

3.71e-02 0.80 

5.71e-03 0.68 

2.25e-01 0.84 

3.35e-05 5.61 

2.24e-01 0.02 

14 

0.6510 

7.02e-03 0.82 

L03e-03 0.65 

2.23e-01 0.88 

3.97e-05 6.41 

2.19e-01 0.01 

15 

0.6937 

4.96e-03 0.95 

2.88e-03 0.92 

2.18e-01 0.88 

4.82e-05 7.86 

2.12e-01 0.02 


TABLE VIII: Numerical experiments on low rank matrix completion algorithms under rank estimation. 
True matrices are multivariate Gaussian with different covariance, m = n = 100, and SR =0.4. 


Problem 

TSl-s 

1 

1 TSl-s2 

1 FPCA 

IRucL 

-q 

1 LMaFit 

rank 

cor 

rel.err 

time 

rel.err 

time 

rel.err 

time 

rel.err 

time 

rel.err 

time 

5 

0.5 

5.49e-06 

0.20 

6.77e-02 

0.86 

L61e-05 

0.12 

7.50e-06 

2.07 

L24e-01 

0.01 

5 

0.6 

5.45e-06 

0.20 

7.74e-02 

0.91 

L69e-05 

0.11 

6.93e-06 

1.76 

9.12e-02 

0.01 

5 

0.7 

5.25e-06 

0.25 

L04e-01 

1.33 

L53e-05 

0.12 

4.71e-04 

2.06 

6.60e-02 

0.01 

10 

0.5 

LlOe-05 

0.65 

L17e-01 

1.14 

L21e-01 

0.97 

L76e-05 

3.35 

9.66e-02 

0.01 

10 

0.6 

L61e-02 

0.76 

L32e-01 

1.04 

L02e-01 

0.86 

2.72e-05 

4.26 

7.33e-02 

0.01 

10 

0.7 

9.14e-02 

0.91 

L55e-01 

0.93 

9.11e-02 

0.82 

7.12e-04 

4.59 

5.06e-02 

0.01 
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Original image 




Sampie image with noise 



LMaFit-inc 



IRucL q 



TS1-S2 



Fig. 3: Image inpainting experiments with SR = 0.3,cr = 0.15. 


C. Image inpainting 

As in [l20l . [[28l . we eondueted grayseale image inpainting experiments to reeover low rank images from 
partial observations, and eompare with IReuL-q and LMaFit algorithms. The ‘boat’ image (see Figure [3]) 
is used to produee ground truth as in [l20ll with rank equal to 40 and at 512 x 512 resolution. Different 
levels of noisy disturbanees are added to the original image Mo by the formula 


M 


Mq + (T 


WMqWf 

Ikll^ 




where the matrix e is a standard Gaussian. 

Here we only applied seheme TSl-s2. For IRueL-q, we followed the setting in EOl by ehoosing a = 0.9 
and A = 10“^(j. Both fixed rank ( LMaFit-fix ) and inereased rank (LMaFit-ine) sehemes are implemented 
for LMaFit. We took fixed rank r = 40 for TSl-s2, LMaFit-fix and IRueL-q. 

Computational results are in Table UX] with sampling ratios varying among {0.3,0.4,0.5} and noise 
strength a in {0.01,0.05,0.10,0.15,0.20,0.25}. The performanee for eaeh algorithm is measured in CPU 
time, PSNR (peak-signal noise ratio), and MSE (mean squared error). Here we foeus more on PSNR 
values and plaeed the top 2 in bold for eaeh experiment. We observed that IRueL-q and TSl-s2 fared 
about the same. Either one is better than EMaEit in most oases. 


V. Conclusion 

We presented the transformed Sehatten-1 penalty (TSl), and derived the olosed form thresholding 
representation formula for global minimizers of TS1 regularized rank minimization problem. We studied 
two adaptive iterative TSl sehemes (TSl-si and TSl-s2) eomputationally for matrix eompletion in 
oomparison with several state-of-art methods, in partioular IRuoE-g. In ease of low rank matrix reoovery 
under known rank, TSl-s2 performs the best in aeouraey and oomputational speed. In low rank matrix 
reoovery under rank estimation, TSl-si is almost on par with IRuoE-q exoept when both the matrix 
oovarianoe and rank rise to eertain level. In future work, we shall study rank estimation teohniques to 
further improve on TSl-si and explore other applioations for TSl penalty. 

























18 


TABLE IX: Numerical experiments on boat image inpainting with algorithms TSl, IRcuL-q and LMaFit 
under different sampling ratio and noise levels. 


Problem 

1 TSl-s2 


IRucL 

-q 

LMaEit 

-inc 

1 LMaEit-fix | 

SR 

a 

Time 

PSNR 

MSE 

Time 

PSNR 

MSE 

Time PSNR 

MSE 

Time 

; PSNR MSE 

0.3 

0.01 

27.23 

44.21 

3.79e-5 

85.97 

43.28 

4.70e-5 

5.70 

32.80 

5.25e-4 

2.17 

45.02 

3.15e-5 

0.3 

0.05 

27.81 

30.55 

8.82e-4 

58.25 

29.55 

Llle-3 

6.00 

29.10 

L23e-3 

2.81 

29.28 

L18e-3 

0.3 

0.10 

29.21 

24.89 

3.24e-3 

24.26 

24.99 

3.17e-3 

5.59 

19.74 

L06e-2 

5.74 

18.52 

L41e-2 

0.3 

0.15 

26.37 

22.57 

5.54e-3 

27.61 

22.74 

5.33e-3 

5.46 

16.64 

2.17e-2 

4.84 

15.98 

2.52e-2 

0.3 

0.20 

26.75 

20.89 

8.14e-3 

24.45 

21.05 

7.85e-3 

5.95 

14.68 

3.41e-2 

3.52 

14.03 

3.95e-2 

0.3 

0.25 

26.92 

19.60 

LlOe-2 

23.75 

19.75 

L06e-2 

5.52 

12.91 

5.12e-2 

1.85 

12.73 

5.33e-2 

0.4 

0.01 

26.29 

44.30 

3.71e-5 

80.19 

43.25 

4.74e-5 

6.53 

44.84 

3.28e-5 

2.93 

45.02 

3.15e-5 

0.4 

0.05 

26.05 

30.58 

8.75e-4 

63.20 

29.39 

L15e-3 

4.62 

29.09 

L23e-3 

3.12 

27.91 

L62e-3 

0.4 

0.10 

26.08 

24.74 

3.35e-3 

32.58 

24.86 

3.27e-3 

6.44 

19.97 

LOle-2 

8.00 

19.19 

L21e-2 

0.4 

0.15 

26.34 

22.57 

5.53e-3 

26.30 

22.72 

5.35e-3 

5.52 

16.78 

2.10e-2 

2.86 

16.21 

2.40e-2 

0.4 

0.20 

29.04 

20.89 

8.15e-3 

20.73 

21.08 

7.81e-3 

5.44 

14.47 

3.58e-2 

2.25 

14.43 

3.61e-2 

0.4 

0.25 

28.84 

19.56 

Llle-2 

20.48 

19.68 

L08e-2 

5.70 

12.79 

5.26e-2 

2.35 

12.57 

5.54e-2 

0.5 

0.01 

27.76 

44.26 

3.75e-5 

82.42 

43.30 

4.67e-5 

5.04 

34.50 

3.55e-4 

2.79 

45.01 

3.15e-5 

0.5 

0.05 

27.89 

30.54 

8.82e-4 

64.19 

29.47 

L13e-3 

5.81 

28.63 

L37e-3 

2.79 

29.62 

L09e-3 

0.5 

0.10 

29.56 

24.80 

3.31e-3 

30.50 

24.94 

3.21e-3 

5.78 

19.92 

L02e-2 

3.54 

19.09 

L23e-2 

0.5 

0.15 

26.21 

22.59 

5.51e-3 

24.24 

22.74 

5.32e-3 

5.71 

16.73 

2.12e-2 

2.67 

16.32 

2.33e-2 

0.5 

0.20 

28.01 

20.89 

8.14e-3 

22.51 

21.07 

7.82e-3 

4.44 

15.67 

2.71e-2 

2.42 

14.38 

3.65e-2 

0.5 

0.25 

29.86 

19.52 

L12e-2 

18.32 

19.71 

L07e-2 

5.54 

12.62 

5.48e-2 

3.24 

12.74 

5.32e-2 
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Appendix 

Proof of Ky Fan k-norm inequality 

m 

Proof: Since X = f/Diag((T)l^^, the (j,A;)-th entry of matrix X is Xj^k = X] 

i=l 

Thus, we have 

k km 

trk(x) = E Eu = E E ^*E-,*E* 

j=l j=li=l 

m k m , , 

= E E = E \ 

i=lj=\ 1=1 

where the weight for the singular value ai is defined as: 

k 

i=i 


(1.40) 


(1.41) 


Notice that, 

k 

I <EIU<IIL<I < IT(<.'<)II2||iL<.<)I|2 < 1. 

i=i 

where U{:,i) and are the Ath column vectors for U and V. Also for weights 

m m k km 

El«>f I < EE IU.IIL<I = EEIU.IIL.I 

2=1 ^=lj;=l ^=1^=1 

<E \mj,:)h \\V{j,-.)h<k, 

i=i 

where U{j,:) and V{j,:) are the j-th row vectors for U and V, respectively. 


(1.42) 


(1.43) 
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All the m weights are bounded by 1, with absolute sum at most k<m. Note that a/s are in deereasing 
order. By equation (11.401) . we have, for all k = l,2,...,m, 

m k 

trk(X) <y^q-^ = trk(£>) = ||X||j^fc- 

i=l i=l 

Next, we prove the seeond part of the lemma — equality eondition, by mathematieal induetion. Suppose 
that for a given matrix X, trk(X) =trk(£>), V fc = l,...,m. Here, it is eonvenient to define Xi = aiUiV^, 
where Vi (JJi) is the i-\h eolumn veetor of V (U). Then matrix X ean be deeomposed as the sum of r 

r 

rank-1 matrices, X = ^Xi. 

i=\ 

When k = l, according to tri{X) =tri{D) and the proof above, we know that 

= l andtu-^^ = 0 for i = 2,...,m. 

By the definition of weights in (11.411) . we have = f7i il/i i = 1. Since U and V are both unitary 
matrices, we have: 

t/i,i = Vl,i = ±1 ; Uij = Ujj = Vj = Vjj = 0 for j 7 ^ 1. 

Then vectors Vi (Vi) is the first standard basis vector in space 3?”^ (3?"^). The matrix Xi = ai[/iV/' is 
diagonal 

0 


^1 = 


0 


For any index i, 1 <i < k — 1, suppose that 

Vi,i = = ±1; Vij = Uj^i = Vij = Vj^i = 0 for any index j^i. 

Then matrix Xi = aJJiV^, with l<i<k — l, is diagonal and can be expressed as 

’0 


(1.44) 


a:,; = 


0 


O'* 


(Ath row) 


0 

Under those conditions, let us consider the case with index i = k. Clearly, we have trk{X)=trk{D). 
Similarly as before, thanks to the formula (11.401) and inequalities (11.421) and (11.431) . it is true that 

wf'^ = 1 for i = and = Q for i>k. 

Furthermore, by definition (11.411) . wl ='^Uj^kVj^k = Uk,kVk,k = ^- This is because Uj^k = Vj^k = 0 for 

i=i 


index j < k, by the assumption (11.441) . Thus vectors Uk and 14 are also standard basis vectors with the 
k-th entry to be ±1. Then 

'0 


Xk = akUkV^ = 




{k-th row) 


0 


mxn 
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Finally, we prove that all matriees are diagonal. So the original matrix X = ^Xi is equal 

i=l 

to the diagonal matrix D. The other direetion is obvious. We finish the proof. ■ 
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